Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust, python)!: error on string <-> date cmp #6498

Merged
merged 1 commit into from
Jan 27, 2023

Conversation

ritchie46
Copy link
Member

Yes, not allowing things is a feature. ;)

closes #6468

@github-actions github-actions bot added breaking Change that breaks backwards compatibility enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Jan 27, 2023
@ritchie46 ritchie46 merged commit 7950c3c into master Jan 27, 2023
@ritchie46 ritchie46 deleted the err_date_string_cmp branch January 27, 2023 16:03
@mkleinbort-ic
Copy link

Just checking @ritchie46, will this still work?

import polars as pl 
from datetime import datetime 

pl.DataFrame({
    'myDate': [datetime(2020,1,1), datetime(2021,1,1), datetime(2022,1,1), datetime(2023,1,1)]
}).filter(pl.col('myDate')>'2021-06-01')

>>>
shape: (2, 1)
┌─────────────────────┐
│ myDate              │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2022-01-01 00:00:00 │
│ 2023-01-01 00:00:00 │
└─────────────────────┘

@ritchie46
Copy link
Member Author

Just checking @ritchie46, will this still work?

import polars as pl 
from datetime import datetime 

pl.DataFrame({
    'myDate': [datetime(2020,1,1), datetime(2021,1,1), datetime(2022,1,1), datetime(2023,1,1)]
}).filter(pl.col('myDate')>'2021-06-01')

>>>
shape: (2, 1)
┌─────────────────────┐
│ myDate              │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2022-01-01 00:00:00 │
│ 2023-01-01 00:00:00 │
└─────────────────────┘

That never worked as expected. It will cast the date to string and do a string compare. You don't want that l.

@mkleinbort-ic
Copy link

mkleinbort-ic commented Jan 28, 2023 via email

@ritchie46
Copy link
Member Author

ritchie46 commented Jan 28, 2023

Those codebases would have been broken to begin with. Polars already printed a warning that this likely was not what you should do.

Polars is not pandas and we choose to not support this as string as dates are ambigious.

@mkleinbort-ic
Copy link

mkleinbort-ic commented Jan 28, 2023

I can live with that (though I'll be sad).

If I can argue my case, it is not just Pandas - it is the convention in SQL as well:

SELECT * FROM table WHERE myDateCol  >= '2022-01-01'

This is also the case in PySpark and Snowpark among others.

Moreover, it is not clear to me that this:

df.filter(pl.col('myDate') > '2021-06-01')

is any worse than

df.filter(pl.col('myDate') > datetime(2021, 6, 1))

My preference would be that when comparing a pl.Date/pl.DateTime object with a UTF-8 object, the string be cast to a datetime implicitly, and to only raise an error if that fails.

@ritchie46
Copy link
Member Author

Conversions of strings to dates have so many edge cases and gotcha's that this conversion should be explicit. Then you can specify the format, the timezone handling, etc.

This explicitness will prevent bugs downstream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Change that breaks backwards compatibility enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Comparing Date and non-date-like string should fail - it does not
2 participants